effect sizes when submitting manuscripts for publication since some (and
potentially more) journals in the education field require, or strongly 
encourage, authors to report magnitude of effect measures with their 
statistical interpretation discussion. Therefore, it is important that 
researchers familiarize themselves with various effect size measures and how 
to interpret them. This paper reviews many of the effect size statistics 
available, including both corrected and uncorrected measures such as eta 
squared, omega squared, epsilon squared, the Wherry formula, the Herzberg
formulas (both fixed and random models), and the Lord formula. These 
statistics are explained and demonstrated with heuristic data. Several 
recommendations are presented to facilitate appropriate interpretation of 
effect sizes, including the consideration of estimate bias, fixed or random 
effects design use, and whether analyses were conducted using univariate or 
multivariate statistical approaches. One table shows actual calculations 
using general linear model analyses. (Contains 42 references.) (SLD) 
Abstract

Best practice is now understood by many to include reporting effect sizes when 
submitting manuscripts for publication since some (and potentially more) journals in the 
education field require (or strongly encourage) authors to report magnitude of effect 
measures with their statistical interpretation discussion. Therefore it is important that 
researchers familiarize themselves with various effect size measures and how to interpret 
them. This paper reviews many of the effect size statistics available, including both 
corrected and uncorrected measures such as eta squared, omega squared, epsilon squared, 
the Wherry formula, the Herzberg formulas (both fixed and random models), and the
Lord formula. These statistics are explained and demonstrated with heuristic data.
Furthermore, several recommendations are presented to facilitate appropriate
interpretation of effect sizes, including the consideration of estimate bias, fixed or random
effects design use, and whether analyses were conducted using univariate or multivariate 
statistical approaches. 
Understanding and Interpreting Effect Size Measures in General Linear Model Analyses

Introduction 

When reporting the statistical significance of research findings, investigators should 
include in their discussion the “meaningfulness” of the outcomes of their research. Thomas and 
Nelson (1996) defined meaningfulness as the “importance or practical significance of an effect or 
relationship” (p. 109). Many authors and the APA Publication Manual (American Psychological
Association, 1994; Cohen, 1990; Rosnow & Rosenthal, 1989; Thomas, Salazar, & Landers,
1991; Thompson, 1994) indicated the need to provide an acceptable estimate of
“meaningfulness” with all tests of statistical significance. These statements are predicated upon
one of the most important issues in statistics, which is the use of reasonable judgement in
interpreting statistical findings.

“Meaningfulness” judgements made by researchers must be based upon the context of 
theory, previous work, and a sound understanding of relevant basic concepts which undergird the 
information needed to determine whether a study’s outcomes have merit. Statistically speaking, 
several important issues which should be considered in this merit interpretation involve the 
relationship between alpha, statistical power, sample size, and effect size (Kraemer & Thiemann, 
1987; Murray & Dosser, 1987; Thompson, 1989; Thompson, 1994; Thompson, 1997).

Magnitude of Effect Measures 


Over the past three decades, researchers (Bakan, 1966; Cohen, 1969; Cohen, 1990;
Cooper & Hedges, 1994; Glass, McGraw, & Smith, 1981; Hedges & Olkin, 1985; Kupfersmid,
1988; Thomas & French, 1986; Thompson, 1989) have argued that one of the most effective
methods for interpreting the true meaningfulness of a reported “statistically significant”
difference between means (traditional, comparative study) or a relationship (correlation/
regression-based study) is the calculation and reporting of a research study’s effect size (i.e.,
determining the magnitude of the effect). The previously noted researchers have supported the
inclusion of magnitude-of-effect measures in all research presented and/or published for which these
measures are relevant. Thomas and Nelson (1996) and Hedges and Olkin (1985) illustrated 
examples of how small differences/relationships can be interpreted as statistically significant 
based upon the presence of large sample sizes, or conversely, large differences/relationships can 
be declared statistically non-significant due to small sample sizes. Indeed, Cohen (1990) pointed
out that the primary purpose of research should be to measure the magnitude of effect rather than
the traditionally reported “statistical significance,” which relies on p values. Carver (1978), Chow
(1988), Franks and Huck (1986), Huberty (1987), and Thomas, Salazar, and Landers (1991) all
have indicated that a single study resulting in the dichotomous decision of “failing to reject” or
“rejecting” a null hypothesis at a predetermined alpha level has minimal impact
on theory development or practice. On the other hand, reporting of estimates of the magnitude
of effect observed may offer comparative standards with past and present research, and assist the 
researcher in identifying important characteristics for subsequent follow-up research. 

Carver (1978) and Rosnow and Rosenthal (1989) pointed out that one of the primary 
contributions of a magnitude of effect statistic is providing the reader with a sense of how much 
of the dependent variable can be controlled, predicted, or explained by the independent 
variable(s). In behavioral research, investigators are often interested in determining the amount 
of variance accounted for or explained by the presence of another variable (Chow, 1988;
O’Grady, 1982). The use of estimates regarding the magnitude of an effect can assist the
researcher in determining whether statistically significant findings are of any meaningful
significance in the world of professional practice. Snyder and Lawson (1993) illustrated this 
point with an excellent example: 

For example, use of an instructional method that increases the performance of an 
experimental group on a dependent measure by 5 points over a control group will 
result in statistically significant findings, if the sample size is large enough. Whether 
or not such a 5-point difference (i.e. magnitude of effect) between groups is 
meaningful from an instructional standpoint depends on many factors besides the 
statistically significant p value. (p. 335)

Snyder and Lawson also pointed out that it is important for researchers to avoid misinterpretation of
a small p value. A relatively small p value does not necessarily mean that there is a strong 
relationship between the independent and dependent variables of interest in a study. 

A concern expressed by many in the behavioral sciences, particularly in
the fields of health, physical education, and recreation (Franks & Huck, 1986; Looney et al.,
1994; Thomas & French, 1986; Thomas & Nelson, 1996; Thomas, Salazar, & Landers,
1991), is the apparent lack of attention researchers in these fields give to effect size: they omit
discussion of effect size from their papers, or even omit from their reports the components
necessary for accurate estimation of effect size, such as sample sizes, Ms and SDs, and full
disclosure of all factors/variables within the design. Thomas and Nelson (1996) pointed out that
such failures to report these data not only make accurate estimation of effect size impossible, but
also create great difficulties for other researchers in making comparisons from past to future
research and in conducting meta-analyses. Thomas, Salazar, and Landers (1991) aptly
summarized this perspective: “Fundamental to the reporting of good science is information for
comparison to other work; at a minimum that includes the Ms and SDs of the variables of interest
(main effects and interactions)” (p. 346). Looney, Feltz, and VanVleet (1994) concurred with
their health, physical education, and recreation colleagues.

Post-hoc estimates of effect size (ES) and omega squared ($\omega^2$) provide methods which
can assist in the evaluation of meaningfulness. A standardized effect size such as Cohen’s d
($ES = [M_1 - M_2]/s$) may be used to represent an estimate of meaningfulness concerning the
difference between two levels of comparison (groups, for example), very much like a follow-up
comparison (t test), while omega squared ($\omega^2$) may be used to estimate the proportion of total
variance accounted for by the independent variable(s). Kraemer and Thiemann (1987) discussed
the use of a priori, post hoc, or both methods of analysis in using ES, $\omega^2$, and the statistical
indices of power and alpha in the interpretation of a study’s “meaningfulness.”
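
The standardized difference described above can be sketched in a few lines of Python. This is a minimal illustration, not a procedure taken from the sources cited: it assumes that s is the pooled within-group standard deviation (one common choice), and the function name and the two score lists are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Standardized mean difference (M1 - M2) / s, with s the pooled SD (an assumption)."""
    n_a, n_b = len(group_a), len(group_b)
    # Pooled variance weights each sample variance by its degrees of freedom.
    pooled_var = ((n_a - 1) * variance(group_a)
                  + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


# Hypothetical scores, invented for illustration only.
treatment = [14, 15, 17, 18, 21]
control = [11, 12, 14, 15, 18]
d = cohens_d(treatment, control)
print(round(d, 2))
```

Because d is expressed in standard deviation units, it can be compared across studies that used different measurement scales.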

Importantly, Cohen (1988) suggested that a useful a priori procedure is the calculation of 
power for each of the statistical procedures planned for application within a study. The 
computation of any one of four indices (alpha, beta, sample size, and effect size) requires
information on the other three; power is 1 - beta. Researchers can therefore also estimate the
sample size needed to detect a certain effect (ES)
given a specific alpha level, beta level, and anticipated effect size (Kraemer & Thiemann, 1987).
It is possible, of course, that a valid a priori estimate of ES is not available, and power cannot be
calculated. Whether this is the case or not, Cohen (1988) argued it is indeed useful to calculate a
post hoc estimate of ES and $\omega^2$ for the comparison of greatest interest. Thomas, Salazar, and
Landers (1991) argued that not only should magnitude of effects be calculated, but that they
should be interpreted for the reader (e.g., .2 = small, .5 = moderate, .8 = large for standardized
differences [Cohen, 1988]) and compared to interpreted findings in relevant studies.
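
The relationship among alpha, beta, sample size, and effect size can be illustrated with a rough a priori sample size calculation. The sketch below assumes a two-sided comparison of two independent group means and uses the large-sample normal approximation rather than the exact procedures tabled by Cohen (1988) or Kraemer and Thiemann (1987); the function name and default values are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate n per group for a two-sided, two-group comparison of means.

    Normal approximation (an assumption of this sketch); exact t-based
    methods give slightly larger values in small samples.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the two-sided test
    z_beta = z.inv_cdf(power)           # power = 1 - beta
    return ceil(2 * ((z_alpha + z_beta) / effect_size) ** 2)


# Cohen's (1988) benchmarks for small, moderate, and large standardized differences.
for d in (0.2, 0.5, 0.8):
    print(d, n_per_group(d))
```

The required sample size grows with the inverse square of the anticipated effect size, which is why small effects demand very large samples.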

Magnitude-of-Association Measures

Several authors have noted that researchers use many different terms when discussing 
magnitude of effect estimates (Hedges, 1982; Hedges, 1984; Franks & Huck, 1986; Murray &
Dosser, 1987; Murphy, 1997; O’Grady, 1982; Thomas & French, 1986). These labels may
include (a) magnitude of effect, (b) magnitude of experimental effect, (c) explained variance,
(d) effect size, (e) strength of relation, and (f) strength of association. Maxwell and Delaney
(1990) and Thomas, Salazar, and Landers (1991) discussed the organization of magnitude-of-effect
indices into two broad categories: measures of effect size and measures of association
strength. The first category includes indices which directly involve investigation of differences
between means. Many of these indices include estimates typically used in meta-analysis
techniques, such as (a) mean differences, (b) effect parameter measures, and (c) standardized
differences between means, such as Cohen’s d (Cohen, 1988). There exist numerous works
which discuss effect sizes for mean differences, including the necessary calculations (Camp & 
Maxwell, 1983; Cooper & Hedges, 1994; Hedges & Olkin, 1985; Maxwell & Delaney, 1990; 
O’Grady, 1982; Thomas & Nelson, 1996). Interested readers are referred to the previously 
mentioned citations for a thorough review of calculations regarding these effect sizes. 

Since all analytical procedures are inherently correlational in nature (Cohen, 1968;
Cohen & Cohen, 1983; Hedges & Olkin, 1983; Knapp, 1978; Thompson, 1991), the focus of the
present paper will be on the discussion of issues regarding magnitude-of-association. Indices
within this second broad category can be used on data from either experimental or correlational
design frameworks.

Magnitude-of-association indices attempt to elucidate association strength between
variables (e.g., the proportion of variance shared between dependent and independent
variables). Snyder and Lawson (1993) pointed out several magnitude-of-association measures
which have been developed over the past 40 years in attempts to address the issue of
interpreting “meaningfulness,” including eta squared, partial eta squared (Cohen, 1973), omega squared,
epsilon squared, $R^2$ (Stevens, 1992), partial $R^2$ (Cohen & Cohen, 1983), the Wherry formula,
the Herzberg formula, and the Lord formula. Snyder and Lawson noted that the literature
covering magnitude-of-association indices generally includes discussions of the following
categories: (a) biased or unbiased computations, (b) whether the indices are based upon
population or sample calculation estimates, (c) fixed or random effect model utilization, and (d)
univariate or multivariate statistical analyses. As Snyder and Lawson observed, these various 
indices of effect are all part of the same general linear model (Henson, in press) and therefore are 
similar conceptually and computationally. Researchers are encouraged to use such indices as 
appropriate within a given research design context. 

Computing Magnitude-of-Association Measures: Some Considerations
Biased and Unbiased Estimates:

Thompson (1990) noted that eta squared (ANOVA) and $R^2$ (regression), both of which
express ratios of explained variance to total variance, tend to overestimate the proportion of
variability explained in the population or future samples. Discussion offered by Stevens (1992)
concurred with this perspective, indicating that simple magnitude-of-association indices may
cause overestimates due to the “mathematical maximization principle” operating in general linear
model analyses. Variation due to sampling error can potentially cause simple magnitude-of-association
indices to be positively biased. As such, these estimates are often called biased or
uncorrected effect size measures. Stevens (1992) and Thompson (1997) both noted that even if
no relationship existed between X and Y variables within a population ($R = 0$), the sample $R^2$ or
eta squared would probably not be exactly zero. Thompson (1994) indicated that three
characteristics of a research study may impact the amount of positive bias in uncorrected
magnitude-of-association indices: (a) sample size (larger sample/less bias), (b) number
of variables (fewer variables/less bias), and (c) true population effect size (effect size larger/less
bias).
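
The positive bias described above can be seen in a small simulation. The sketch below is illustrative only: it assumes a single predictor, draws X and Y independently from standard normal distributions so that the population R is zero, and averages the sample $r^2$ over many replications. The function names, seed, and replication count are hypothetical choices. Under these assumptions the expected sample $r^2$ is $1/(n - 1)$, so the average should shrink toward zero as n grows without ever reaching it.

```python
import random


def r_squared(x, y):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


def mean_null_r_squared(n, replications=2000, seed=1):
    """Average sample r^2 when X and Y are independent (population R = 0)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(replications):
        x = [rng.gauss(0, 1) for _ in range(n)]
        y = [rng.gauss(0, 1) for _ in range(n)]
        total += r_squared(x, y)
    return total / replications


for n in (10, 25, 100):
    # Compare the simulated average with the theoretical value 1 / (n - 1).
    print(n, round(mean_null_r_squared(n), 3), round(1 / (n - 1), 3))
```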

In an attempt to statistically correct for potential positive bias in the calculation of 
magnitude-of-association indices, statisticians have developed “unbiased” or “corrected” 
formulas. Snyder and Lawson (1993) provided an excellent summary of both corrected and
uncorrected formulas, including $R^2$ and eta squared, omega squared (fixed and random effects),
epsilon squared, Wherry, Herzberg (fixed and random effects), and Lord formulas. The authors
indicated that the Herzberg and Lord formulas are designed to correct for estimates potentially realized in
future samples, and the values they produce will always be smaller than those from the other formulas discussed.
These two estimates will result in the most shrinkage in the size of the magnitude-of-association 
estimate because they adjust for sampling error in both a given research study and some future 
research study. Omega squared, epsilon squared, and the Wherry formula are recommended for 
use in research designed to develop population expectations, since these are designed to estimate 
association strength most likely to be realized in the population. 

Fixed or Random-Effect Design Models:

Factors in a fixed-effect model ANOVA design and values within a fixed-effects
regression model design are assumed to be just that: fixed (i.e., they do not change). When factors are
randomly selected in ANOVA designs, or when predictors are randomly selected in a regression
model design, they can change from one replication of the study to another. Therefore, the
potential increase in sampling error (with people or variable levels) should be corrected for in the
magnitude-of-association estimate calculations. Stevens (1992) presented several different
formulas for the Herzberg correction for fixed and random design models. Snyder and Lawson
(1993) and Thomas and Nelson (1996) included an omega squared formula in their discussions
of effect size computational formulas. Murray and Dosser (1987), Thomas and Nelson (1996),
and Tolson (1980) have emphasized that investigators must pay attention to the underlying 
assumptions of various models during the selection process of a bias-correction formula for 
magnitude-of-association estimates. 

Univariate or Multivariate Estimates: 

Most formulas discussed by Snyder and Lawson (1993), Stevens (1992), and others
thus far were developed from a univariate perspective. However, Huberty (1972),
Thomas and Nelson (1996), and Tolson (1980) described formulas which may be used in
multivariate cases. Stevens (1992) noted that the multivariate $D^2$ (mean vectors and sample
covariance matrix S) can replace their univariate counterparts d and s. Pedhazur (1982)
discussed 1 - lambda as analogous to the univariate $R^2$ or eta squared.

Perspectives of the General Linear Model 

Maxwell, Camp, and Avery (1981) noted that magnitude-of-association strength 
estimates developed from a particular GLM perspective may be expected to produce similar 
statistical values to other techniques that essentially come from the same source. For example,
epsilon squared (fixed effects ANOVA) will be numerically equal to the Wherry formula
(regression). Omega squared (random effects ANOVA) will equal the squared intra-class
correlation (correlation model) (Snyder & Lawson, 1993). Maxwell et al. (1981) espoused
viewing all measures of magnitude-of-association from the perspective of the general linear model. As
previously noted, all of the effect size measures are conceptually similar. A full understanding
and interpretation of these measures is facilitated by considering their common origin.
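
The equality of epsilon squared and the Wherry formula can be verified algebraically. The derivation below is a sketch under stated assumptions: a one-way fixed-effects ANOVA with $v$ groups is treated as a regression on $k = v - 1$ coded predictors, so that $R^2 = SS_{\text{explained}} / SS_{\text{total}}$ and $MS_{\text{error}} = SS_{\text{error}} / (n - k - 1)$.

```latex
\begin{aligned}
\epsilon^2
  &= \frac{SS_{\text{explained}} - k \, MS_{\text{error}}}{SS_{\text{total}}}
   = R^2 - \frac{k}{n-k-1} \cdot \frac{SS_{\text{error}}}{SS_{\text{total}}} \\
  &= R^2 - \frac{k}{n-k-1}\,(1 - R^2)
   = 1 - (1 - R^2)\left[1 + \frac{k}{n-k-1}\right] \\
  &= 1 - \frac{n-1}{n-k-1}\,(1 - R^2),
\end{aligned}
```

which is the Wherry formula.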

Selection of Effect Estimates 

Numerous researchers have addressed what difference the
selection and subsequent use of various magnitude-of-association estimates makes (Knapp, 1978;
Snyder & Lawson, 1993; Thomas & Nelson, 1996; Thompson, 1991; Thompson, 1994; Uhl &
Eisenberg, 1970). While no clear-cut answer is available which would cover all types of
situations, research has noted that when sample sizes (50-100+) and effect sizes are both large,
biased and unbiased formulas appear to produce similar estimate values. When
sample sizes and effect sizes become smaller, statistical corrections tend to be larger
(Thompson, 1990). The amount of statistical correction varies depending upon the formula used
in the estimate. Snyder and Lawson (1993) concurred with Uhl and Eisenberg (1970) by noting
that the Lord formula produces the most conservative estimates across all sample sizes. The
omega squared formula develops more conservative population estimates, and the Lord formula
develops more conservative sample estimates. Of course, when sample sizes are smaller (< 30),
most bias-corrected formulas produce more conservative estimates than uncorrected formulas.
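
The dependence of the correction on sample size can be illustrated with the Wherry and Lord formulas as they appear in the table of calculations at the end of this paper. The sketch below is illustrative only; the function names, the uncorrected $R^2$ of .50, the three predictors, and the sample sizes are hypothetical values.

```python
def wherry(r2, n, k):
    """Wherry correction: 1 - [(n - 1) / (n - k - 1)] (1 - R^2)."""
    return 1 - ((n - 1) / (n - k - 1)) * (1 - r2)


def lord(r2, n, k):
    """Lord correction: 1 - (1 - R^2) [(n + k + 1) / (n - k - 1)]."""
    return 1 - (1 - r2) * ((n + k + 1) / (n - k - 1))


def shrinkage_table(r2, k, sample_sizes):
    """Return (n, Wherry shrinkage, Lord shrinkage) for each sample size."""
    return [(n, r2 - wherry(r2, n, k), r2 - lord(r2, n, k)) for n in sample_sizes]


# Hypothetical inputs: uncorrected R^2 = .50 with k = 3 predictors.
for n, wherry_drop, lord_drop in shrinkage_table(0.50, 3, [15, 30, 100, 500]):
    print(f"n={n:4d}  Wherry shrinkage={wherry_drop:.3f}  Lord shrinkage={lord_drop:.3f}")
```

In this sketch the shrinkage under both formulas falls as n increases, and the Lord formula removes more than the Wherry formula at every sample size, consistent with the pattern described above.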

Conclusion 

Magnitude-of-effect measures are important tools in assisting investigators to develop
further insights into the true meaning of their research outcomes. Maxwell et al. (1981), Murray
and Dosser (1987), O’Grady (1982), and Thompson (1997) discussed the need for researchers to
understand the appropriate application and interpretation of these indices, so that these
measures can be put to the most productive use. Some general considerations regarding use of
magnitude of effect indices are offered by Cooper and Hedges (1994), Hedges and Olkin
(1985), Snyder and Lawson (1993), and Thomas and Nelson (1996).

1. Generalizations about effect estimates should be limited to studies that involve the 
same levels of variables, and similar numbers/types of subjects. Estimates tend to be very 
context dependent. 

2. Confidence intervals should be constructed for magnitude-of-association measures,
since these measures may be viewed as point estimates of population values which do not by
themselves take into account sampling error (Fowler, 1985).

3. While Cohen (1988) proposed general guidelines for assessing the relative size of an 
effect estimate, practical significance must rest with the individual researcher’s judgment, the 
importance of the research questions posed, the design of the study, and ultimately with the true 
societal impact of the findings. Since there are many contextual issues which may affect the
interpretation of an estimate, no strict or arbitrary guideline should be employed in making the final
judgement on practical meaningfulness of a study’s outcome.
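
The second recommendation above can be illustrated with a simple resampling approach. The sketch below uses a percentile bootstrap, which is a technique swapped in here for illustration and not the method discussed by Fowler (1985). It assumes a single predictor, and the function names, the data, the seed, and the number of replications are all hypothetical.

```python
import random


def pearson_r_squared(pairs):
    """Squared Pearson correlation for a list of (x, y) pairs."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    return sxy * sxy / (sxx * syy)


def bootstrap_ci(pairs, level=0.95, replications=2000, seed=1):
    """Percentile bootstrap interval for r^2 (an illustrative choice of method)."""
    rng = random.Random(seed)
    stats = []
    for _ in range(replications):
        # Resample whole cases with replacement so each x stays with its y.
        resample = [rng.choice(pairs) for _ in pairs]
        stats.append(pearson_r_squared(resample))
    stats.sort()
    tail = (1 - level) / 2
    lower = stats[int(tail * replications)]
    upper = stats[int((1 - tail) * replications) - 1]
    return lower, upper


# Hypothetical data, invented for illustration only.
data = [(1, 2.1), (2, 2.9), (3, 3.2), (4, 4.8), (5, 4.1), (6, 6.3),
        (7, 6.0), (8, 8.4), (9, 7.7), (10, 9.9), (11, 10.2), (12, 12.5)]
print(round(pearson_r_squared(data), 3), bootstrap_ci(data))
```

Reporting the interval alongside the point estimate shows the reader how much the estimate could vary from sample to sample.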
ACTUAL CALCULATIONS USING GLM ANALYSES

Eta squared:
\[
\eta^2 = R^2 = \frac{SS_{\text{explained}}}{SS_{\text{total}}}
       = \frac{143.846}{176.000} = .817
\]

Omega squared:
\[
\omega^2 = \frac{SS_{\text{explained}} - (v-1)\,MS_{\text{error}}}{SS_{\text{total}} + MS_{\text{error}}}
         = \frac{143.846 - (8-1)(1.891)}{176.000 + 1.891}
         = \frac{130.609}{177.891} = .734
\]

*Omega squared:
\[
\omega^2 = \frac{F(k-1) - (k-1)}{F(k-1) + (n-k) + 1}
         = \frac{10.865(7-1) - (7-1)}{10.865(7-1) + (25-7) + 1}
         = \frac{59.190}{83.190} = .712
\]

Epsilon squared:
\[
\epsilon^2 = \frac{SS_{\text{explained}} - (v-1)\,MS_{\text{error}}}{SS_{\text{total}}}
           = \frac{143.846 - (8-1)(1.891)}{176.000}
           = \frac{130.609}{176.000} = .742
\]

Wherry:
\[
1 - \frac{n-1}{n-k-1}\,(1 - R^2)
  = 1 - \frac{25-1}{25-7-1}\,(1 - .817)
  = 1 - (1.412)(.183) = .742
\]

Herzberg:
\[
1 - \frac{n-1}{n-k-1} \cdot \frac{n+k+1}{n}\,(1 - R^2)
  = 1 - \frac{24}{25-7-1} \cdot \frac{25+7+1}{25}\,(1 - .817)
  = 1 - (1.412)(1.32)(.183) = 1 - .341 = .659
\]

*Herzberg:
\[
1 - \frac{n-1}{n-k-1} \cdot \frac{n-2}{n-k-2} \cdot \frac{n+1}{n}\,(1 - R^2)
  = 1 - \frac{24}{25-7-1} \cdot \frac{25-2}{25-7-2} \cdot \frac{25+1}{25}\,(.183)
  = 1 - (1.412)(1.438)(1.04)(.183) = 1 - .386 = .614
\]

Lord:
\[
1 - (1 - R^2)\,\frac{n+k+1}{n-k-1}
  = 1 - (.183)\,\frac{25+7+1}{25-7-1}
  = 1 - (.183)(1.941) = 1 - .355 = .645
\]

* Random effects model
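
The calculations above can be reproduced programmatically. The sketch below is illustrative only: the function name and dictionary keys are hypothetical, and it assumes that the number of groups minus one ($v - 1$) equals the number of predictors $k$, as in the heuristic data ($v = 8$, $k = 7$). It covers the measures that depend only on the sums of squares, $n$, and $k$; the random-effects omega squared, which the table computes from F, is omitted. Because the code carries unrounded intermediate values, results may differ from the hand calculations in the third decimal place (the table rounds $1 - R^2$ to .183).

```python
def effect_sizes(ss_explained, ss_total, n, k):
    """Magnitude-of-association estimates from ANOVA/regression summary values."""
    df_error = n - k - 1
    ms_error = (ss_total - ss_explained) / df_error
    r2 = ss_explained / ss_total  # eta squared = R^2
    unexplained = 1 - r2
    return {
        "eta_squared": r2,
        "omega_squared_fixed": (ss_explained - k * ms_error) / (ss_total + ms_error),
        "epsilon_squared": (ss_explained - k * ms_error) / ss_total,
        "wherry": 1 - (n - 1) / df_error * unexplained,
        "herzberg_fixed": 1 - (n - 1) / df_error * (n + k + 1) / n * unexplained,
        "herzberg_random": 1 - (n - 1) / df_error * (n - 2) / (n - k - 2)
                             * (n + 1) / n * unexplained,
        "lord": 1 - unexplained * (n + k + 1) / df_error,
    }


# Heuristic values from the table: SS explained, SS total, n, and k.
results = effect_sizes(143.846, 176.000, n=25, k=7)
for name, value in results.items():
    print(f"{name:18s} {value:.3f}")
```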
COMPARISON OF EFFECT SIZE COMPUTED ON HEURISTIC DATA

SUMMARY OF RESULTS

eta squared ($R^2$)           =  .817
omega squared ($\omega^2$)    =  .734
*omega squared ($\omega^2$)   =  .712
epsilon squared               =  .742
Wherry Formula                =  .742
Herzberg Formula              =  .659
*Herzberg Formula             =  .614
Lord Formula                  =  .645